Confidence driven TGV fusion
We introduce a novel model for spatially varying variational data fusion,
driven by point-wise confidence values. The proposed model allows for the joint
estimation of the data and the confidence values based on the spatial coherence
of the data. We discuss the main properties of the introduced model as well as
suitable algorithms for estimating the solution of the corresponding biconvex
minimization problem and their convergence. The performance of the proposed
model is evaluated on the problem of depth image fusion, using both synthetic
and real data from publicly available datasets.
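The alternating scheme for a biconvex problem of this kind can be illustrated with a toy sketch. The code below is NOT the paper's TGV model: it fuses two noisy 1D depth scans by alternating between a confidence-weighted data/smoothness update of the fused signal and a re-estimation of per-point confidences from the residuals. All parameter names and values (`lam`, `sigma`) are illustrative assumptions.

```python
import math

def fuse(d1, d2, n_iters=50, lam=0.5, sigma=0.1):
    """Toy confidence-driven fusion of two 1D scans (illustrative only).

    Alternates a u-step (confidence-weighted average plus a simple
    smoothness term standing in for TV/TGV regularization) and a w-step
    (down-weighting observations inconsistent with the current estimate).
    """
    n = len(d1)
    w1 = [1.0] * n
    w2 = [1.0] * n
    u = [(a + b) / 2 for a, b in zip(d1, d2)]
    for _ in range(n_iters):
        # u-step: minimize the data term for fixed confidences
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]
            right = u[i + 1] if i < n - 1 else u[i]
            num = w1[i] * d1[i] + w2[i] * d2[i] + lam * (left + right)
            den = w1[i] + w2[i] + 2 * lam
            u[i] = num / den
        # w-step: confidences decay with the squared residual
        w1 = [math.exp(-((d1[i] - u[i]) ** 2) / sigma) for i in range(n)]
        w2 = [math.exp(-((d2[i] - u[i]) ** 2) / sigma) for i in range(n)]
    return u, w1, w2
```

With one scan corrupted by an outlier, the corresponding confidence collapses and the fused signal follows the clean scan, which is the qualitative behavior the joint estimation is after.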
Discovery and recognition of motion primitives in human activities
We present a novel framework for the automatic discovery and recognition of
motion primitives in videos of human activities. Given the 3D pose of a human
in a video, human motion primitives are discovered by optimizing the `motion
flux', a quantity which captures the motion variation of a group of skeletal
joints. A normalization of the primitives is proposed in order to make them
invariant with respect to a subject's anatomical variations and the data
sampling rate. The discovered primitives are unknown and unlabeled; they are
collected into classes in an unsupervised manner via a hierarchical
non-parametric Bayesian mixture model. Once classes are determined and labeled,
they are further analyzed to establish models for recognizing the discovered
primitives. Each primitive model is defined by a set of learned parameters.
Given new video data and the estimated pose of the subject appearing in the
video, the motion is segmented into primitives, which are recognized with a
probability given by the parameters of the learned models.
Using our framework, we build a publicly available dataset of human motion
primitives from sequences taken from well-known motion capture datasets. We
expect that our framework, by providing an objective way for discovering and
categorizing human motion, will be a useful tool in numerous research fields
including video analysis, human inspired motion generation, learning by
demonstration, intuitive human-robot interaction, and human behavior analysis.
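The segmentation step can be sketched with a crude stand-in for the motion flux. The paper defines its own quantity over a group of skeletal joints; here we simply sum per-frame joint displacements and cut primitives wherever the aggregate motion (nearly) vanishes. The exact definition and the threshold `eps` are assumptions, not the paper's.

```python
def flux_signal(trajectories):
    """Summed per-frame joint displacement (a simple proxy, not the
    paper's motion flux). Each trajectory is a list of (x, y, z) tuples."""
    n_frames = len(trajectories[0])
    sig = [0.0] * (n_frames - 1)
    for traj in trajectories:
        for t in range(n_frames - 1):
            dx = traj[t + 1][0] - traj[t][0]
            dy = traj[t + 1][1] - traj[t][1]
            dz = traj[t + 1][2] - traj[t][2]
            sig[t] += (dx * dx + dy * dy + dz * dz) ** 0.5
    return sig

def segment_primitives(sig, eps=1e-3):
    """Cut the sequence into primitives at frames of (near-)zero motion."""
    segments, start = [], None
    for t, v in enumerate(sig):
        if v > eps and start is None:
            start = t
        elif v <= eps and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(sig)))
    return segments
```

A move-pause-move trajectory thus yields two primitives, one per burst of motion.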
Inverse problem theory in shape and action modeling
In this thesis we consider shape and action modeling problems under the perspective of
inverse problem theory. Inverse problem theory proposes a mathematical framework for
solving model parameter estimation problems. Inverse problems are typically ill-posed,
which makes their solution challenging. Regularization theory and Bayesian statistical
methods, which are proposed in the context of inverse problem theory, provide suitable
methods for dealing with ill-posed problems.
Regarding the application of inverse problem theory in shape and action modeling,
we first discuss the problem of saliency prediction, considering a model proposed by the
coherence theory of attention. According to coherence theory, salient regions emerge
via proto-objects which we model using harmonic functions (thin-membranes). We also
discuss the modeling of the 3D scene, as it is fundamental for extracting suitable scene
features, which guide the generation of proto-objects.
The next application we consider is the problem of image fusion. In this context,
we propose a variational image fusion framework, based on confidence driven total
variation regularization, and we consider its application to the problem of depth image
fusion, which is an important step in the dense 3D scene reconstruction pipeline.
The third problem we consider concerns action modeling, and in particular the
recognition of human actions based on 3D data. Here, we employ a Bayesian nonparametric
model to capture the idiosyncratic motions of the different body parts. Recognition
is achieved by comparing the motion behaviors of the subject to a dictionary of
behaviors for each action, learned from examples collected from other subjects.
Next, we consider the 3D modeling of articulated objects from images taken from
the web, with application to the 3D modeling of animals. By decomposing the full
object into rigid components and by considering different aspects of these components,
we model the object bottom-up along this hierarchy, obtaining a 3D model of the entire object.
Both single-view 3D modeling and model registration are performed based on
regularization methods.
The last problem we consider is the modeling of 3D specular (non-Lambertian)
surfaces from a single image. To solve this challenging problem we propose a Bayesian
non-parametric model for estimating the normal field of the surface from its appearance,
by identifying the material of the surface. After computing an initial model of the
surface, we apply regularization of its normal field considering also a photo-consistency
constraint, in order to estimate the final shape of the surface.
Finally, we conclude this thesis by summarizing the most significant results and
by suggesting future directions regarding the application of inverse problem theory to
challenging computer vision problems, such as the ones encountered in this work.
Point Cloud Structural Parts Extraction based on Segmentation Energy Minimization
In this work we consider 3D point sets, which in a typical setting represent unorganized point clouds. Segmentation of these point sets requires first singling out structural components of the unknown surface discretely approximated by the point cloud. Structural components, in turn, are surface patches approximating unknown parts of elementary geometric structures, such as planes, ellipsoids, spheres and so on. Our approach is based on level set methods, which compute the moving front of the surface and trace the interfaces between its different parts. Level set methods are widely recognized as among the most effective methods for segmenting both 2D images and 3D medical images, and level set methods for 3D segmentation have recently received increasing interest. We contribute by proposing a novel approach for raw point sets. Based on the motion and distance functions of the level set, we introduce four energy minimization models, which are used for segmentation, considering an equal number of distance functions specified by geometric features. Finally, we evaluate the proposed algorithm on point sets simulating unorganized point clouds.
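The role of the distance functions can be illustrated without the level-set machinery. The sketch below scores each point against two candidate elementary structures (a plane and a sphere) via their distance functions and labels it by the structure minimizing that distance energy; the paper instead evolves level-set fronts, so this is only the distance-energy idea, with made-up parameters.

```python
def plane_dist(p, normal, d):
    """Unsigned distance |n.p + d| of point p to a plane with unit normal n."""
    return abs(normal[0] * p[0] + normal[1] * p[1] + normal[2] * p[2] + d)

def sphere_dist(p, center, r):
    """Unsigned distance of point p to a sphere of radius r."""
    dx, dy, dz = (p[i] - center[i] for i in range(3))
    return abs((dx * dx + dy * dy + dz * dz) ** 0.5 - r)

def label_points(points, plane, sphere):
    """Assign each point to the candidate structure of minimal distance energy."""
    labels = []
    for p in points:
        e_plane = plane_dist(p, *plane)
        e_sphere = sphere_dist(p, *sphere)
        labels.append('plane' if e_plane <= e_sphere else 'sphere')
    return labels
```

Extending the same pattern to ellipsoids and other primitives gives one distance-driven energy per structural component, mirroring the abstract's "equal number of distance functions".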
Saliency prediction in the coherence theory of attention
In the coherence theory of attention, introduced by Rensink, O'Regan, and Clark (2000), a coherence field is defined by a hierarchy of structures supporting the activities taking place across the different stages of visual attention. At the interface between low-level and mid-level attention processing stages are the proto-objects; these are generated in parallel and collect features of the scene at a specific location and time. These structures fade away if the region is no longer attended. We introduce a method to computationally model these structures. Our model is grounded experimentally in data collected in dynamic 3D environments via the Gaze Machine, a gaze measurement framework. This framework allows recording pupil motion at the required speed and projecting the point of regard into 3D space (Pirri, Pizzoli, & Rudi, 2011; Pizzoli, Rigato, Shabani, & Pirri, 2011). To generate proto-objects, the model is extended to vibrating circular membranes whose initial displacement is generated by the features that have been selected by classification. The energy of the vibrating membranes is used to predict saliency in visual search tasks.
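The thin-membrane idea can be sketched in one dimension: away from clamped feature locations, a harmonic membrane satisfies the Laplace equation, which a simple relaxation scheme solves. The setup below (a 1D membrane, one clamped feature point, Jacobi/Gauss-Seidel updates) is an illustrative assumption, not the paper's 2D circular-membrane model.

```python
def relax_membrane(values, fixed, n_iters=2000):
    """Relax a 1D membrane toward a harmonic (thin-membrane) configuration.

    `values` gives the initial displacements; indices in `fixed` (plus the
    two endpoints) stay clamped, all other points converge to the average
    of their neighbors, i.e. the discrete Laplace equation.
    """
    u = list(values)
    for _ in range(n_iters):
        for i in range(1, len(u) - 1):
            if i not in fixed:
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u
```

With a single clamped peak, the solution interpolates linearly toward the boundary, which is the characteristic smooth fall-off of a harmonic function around an attended feature.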
Rigid tool affordance matching points of regard
In this abstract we briefly introduce the analysis of simple rigid object affordance by experimentally establishing the relation between the point of regard of subjects before grasping an object and the fingertip points of contact once the object is grasped. The analysis shows that there is a strong relation between these data, thus supporting the hypothesis that people figure out how objects are afforded according to their functionality.
Bayesian non-parametric inference for manifold based MoCap representation
We propose a novel approach to human action recognition from motion capture (MoCap) data, based on grouping sub-body parts. By representing configurations of actions as manifolds, joint positions are mapped onto a subspace via principal geodesic analysis. The reduced space is still highly informative and allows for classification based on a non-parametric Bayesian approach, generating behaviors for each sub-body part. Having partitioned the set of joints, poses relative to a sub-body part are exchangeable, given a specified prior, and can elicit, in principle, infinitely many behaviors. The generation of these behaviors is specified by a Dirichlet process mixture. We show with several experiments that the recognition gives very promising results, outperforming methods that require temporal alignment.
Component-wise modeling of articulated objects
We introduce a novel framework for modeling articulated objects based on the aspects of their components. By decomposing the object into components, we divide the problem into smaller modeling tasks. After obtaining 3D models for each component aspect by employing a shape deformation paradigm, we merge them together, forming the object components. The final model is obtained by assembling the components using an optimization scheme which fits the respective 3D models to the corresponding apparent contours in a reference pose. The results suggest that our approach can produce realistic 3D models of articulated objects in reasonable time.